Adds information about cooldown periods for trained model autoscaling in Serverless #2498

Open: wants to merge 5 commits into base: main

@kosabogi (Contributor) commented Aug 11, 2025

This PR adds information about cooldown periods for trained model autoscaling in serverless projects.

Changes

Related issue: https://github.com/elastic/docs-content-internal/issues/177

@kosabogi kosabogi requested a review from ppf2 August 11, 2025 12:15
@kosabogi kosabogi requested a review from a team as a code owner August 11, 2025 12:15
@kosabogi kosabogi added the documentation Improvements or additions to documentation label Aug 11, 2025
github-actions bot commented Aug 11, 2025

@kilfoyle (Contributor) left a comment

LGTM! 🦖
Very nice!

@ppf2 (Contributor) commented Aug 14, 2025

@prwhelan Can you review for technical accuracy? Thx!

* When using the inference API for {{es}} or ELSER, [enable `adaptive_allocations`](../../autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations).

::::{note}
In {{serverless-short}}, trained model deployments scale down to zero only after 24 hours without any inference requests. After scaling up, they remain active for 5 minutes before they can scale down again. During these cooldown periods, you will continue to be billed for the active resources.

@prwhelan (Member) commented:

This is true outside of serverless as well. All environments will now wait 24 hours before scaling to zero: elastic/elasticsearch#128914

Outside of serverless, this can be modified using `xpack.ml.trained_models.adaptive_allocations.scale_to_zero_time`, down to a minimum of one minute.
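
For readers following the thread, here is a minimal Python sketch of the timing rules discussed above. It is an illustration only, not code from Elasticsearch: the function names are hypothetical, and rejecting a value below one minute with an error is an assumption about how the minimum might be enforced. The 24-hour idle period, the 5-minute post-scale-up period, and the one-minute minimum come from the note and the comment above.

```python
from datetime import datetime, timedelta

# Values taken from the discussion above.
DEFAULT_SCALE_TO_ZERO_TIME = timedelta(hours=24)  # idle time before scaling to zero
SCALE_UP_COOLDOWN = timedelta(minutes=5)          # stay active this long after scaling up
MIN_SCALE_TO_ZERO_TIME = timedelta(minutes=1)     # lowest configurable value outside serverless


def validate_scale_to_zero_time(value: timedelta) -> timedelta:
    """Hypothetical check: reject values under the one-minute minimum (assumed behavior)."""
    if value < MIN_SCALE_TO_ZERO_TIME:
        raise ValueError("scale_to_zero_time must be at least one minute")
    return value


def may_scale_to_zero(now: datetime, last_request: datetime,
                      scale_to_zero_time: timedelta = DEFAULT_SCALE_TO_ZERO_TIME) -> bool:
    """True once no inference request has arrived for the configured idle period."""
    return now - last_request >= scale_to_zero_time


def may_scale_down(now: datetime, last_scale_up: datetime) -> bool:
    """True once the post-scale-up cooldown has elapsed."""
    return now - last_scale_up >= SCALE_UP_COOLDOWN
```

With the defaults, a deployment that last served a request 23 hours ago would not yet be eligible to scale to zero, while one idle for a full 24 hours would be.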

@kosabogi (Contributor, Author) commented:

Hi @prwhelan, thanks a lot for your feedback! I've modified my PR based on it, along with a few other smaller changes:

  • Trained model autoscaling: I moved the cooldown period information into its own heading. This makes it easier to highlight and also allows other pages to link directly to this specific section.

  • Autoscaling: I felt that going into the details of cooldown periods here would be out of scope and make the page a bit overwhelming. Instead, I added a more concise sentence that links to the new Cooldown periods section on the Trained model autoscaling page.

  • Elasticsearch billing dimensions: Realizing that this page is only applicable to Serverless, I updated the description for the Machine learning trained model autoscaling bullet point to reflect the new autoscaling behavior in Serverless.

Please let me know if you think these changes are appropriate or if you’d like me to adjust anything.
Thanks again!

Labels
documentation Improvements or additions to documentation

5 participants